Team Members:
Thu Tran: I have extensive experience as a web developer from earlier in my career. Recently, I graduated from Drexel University with a degree in Chemical Engineering, and I am currently pursuing a Master's degree in Data Science. Throughout my diverse career, I have developed a robust skill set that allows me to contribute significantly in various roles. My experiences have enabled me to effectively manage projects, conduct in-depth research, process and analyze data, and create insightful visualizations. This unique combination of skills from different fields allows me to approach problems from multiple perspectives and deliver comprehensive solutions.
Joe Novak: I hold a bachelor's degree in design and currently work as an associate data science analyst at a large healthcare institute, supporting the computational work of Principal Investigators in the domain of lymphoma cancers. I have experience with large-cohort patient clinical data, whole-exome sequencing (WES) data, and RNAseq data. I am comfortable with data manipulation, hypothesis testing, and data visualization. At the same time, I'm enrolled in Drexel's Online Master of Data Science program. From this project I'd like to learn more about the data analysis process and how to communicate results to an audience.
The topic we picked is weather analysis. We brainstormed questions (what, how, why, where, when) to determine what we need to analyze:
Where can we find free data sources: We found several data sources that provide current and historical weather:
The National Oceanic and Atmospheric Administration (NOAA) is an agency within the US Department of Commerce focused on keeping the public informed on current climate conditions. NOAA is rigorously committed to scientific integrity and oversees an umbrella of smaller, focused agencies.
The National Weather Service (NWS) is an agency within NOAA and maintains an API that provides current weather: https://api.weather.gov
The Applied Climate Information System (ACIS) is maintained and developed by NOAA's Regional Climate Centers (RCC). This API provides access to historical weather data: http://data.rcc-acis.org
NOAA provides Philadelphia monthly maximum/average/minimum temperature from 1895 to 2023 through ACIS API at: https://www.weather.gov/wrh/Climate?wfo=phi
How we extract data from these sources: We use Python to request JSON data from the APIs and to download CSV files from NOAA. In the previous phase, we used data from OpenWeatherMap's API. That service allowed only 1000 requests per day, which made it challenging to analyze and show the data. We therefore moved the analysis to NOAA datasets, whose usage restrictions are more relaxed. Ultimately, this change results in a more reliable tool.
How do we know the accuracy of our extracted dataset: NOAA's APIs serve data maintained by the US government, which we treat as the most accurate source available to us. The agency has established an extensive library and research department supporting its observations. After analysis and investigation, we can compare our model's predictions against NOAA data points.
What specific data do we need for analysis: For each location we need the International Civil Aviation Organization (ICAO) airport identifier, monthly average temperature, current temperature, latitude and longitude, and US state code.
As a proxy for statewide values, we use observations from major airports in each state, prioritizing capital cities and international airports. ICAO identifiers are used to query each location. The monthly average temperature is computed as the mean of the average temperatures within each calendar month. To observe longer-term trends we search for data from 1895 to the present.
What specific aspects of the data do we visualize:
- Current temperature for each state in US
- Temperature trend for each state from 1895 to 2023
- Monthly temperature trend for Philadelphia from 1895 to 2023
- Yearly temperature trend for Philadelphia from 1895, projected to 2150
How can we use the data to generate predictions: Based on the trend in the dataset, we can predict the overall temperature for each state in the US. We also predict the monthly and yearly temperature for Philadelphia.
What are the limits and future research: We want to investigate 100 years of climate change in Philadelphia and the other big cities in the US. However, API access costs money. This project could benefit industrial manufacturing companies. With more time and an unlimited dataset, we would calculate and predict temperatures for all cities and US states.
import requests
import urllib.request
import urllib.error
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import plotly.graph_objects as go
import seaborn as sns
import json
from scipy import stats
from sklearn.metrics import mean_squared_error
from statsmodels.tsa.arima.model import ARIMA
import datetime
import warnings
warnings.filterwarnings('ignore')
Using the NWS current weather API and the ACIS historical API, we will briefly explore past and present climate data in the US.
Airport proxies for state analysis
Our goal is to investigate statewide and countrywide temperature patterns across the US. For this analysis we need the monthly average temperature for each state. Capturing a true state average is not feasible here: weather sensors are spread across each state and provide a granular picture of reality, and aggregating them is beyond the scope of this project. So, we decided to use one major airport from each state as a proxy, or pseudotemperature, representing the political boundary. Here is the list of ICAO identifiers for each airport chosen:
ports=[["AL","KBHM"],["AK","PAJN"],["AZ","KPHX"],["AR","KLIT"],["CA","KSMF"],["CO","KDEN"],["CT","KBDL"],
["DE","KILG"],["FL","KTLH"],["GA","KATL"],["HI","PHNL"],["ID","KBOI"],["IL","KORD"],["IN","KIND"],
["IA","KDSM"],["KS","KICT"],["KY","KLEX"],["LA","KMSY"],["ME","KBGR"],["MD","KBWI"],["MA","KBOS"],
["MI","KLAN"],["MN","KMSP"],["MS","KJAN"],["MO","KSTL"],["MT","KBIL"],["NE","KLNK"],["NV","KLAS"],
["NH","KPSM"],["NJ","KTTN"],["NM","KABQ"],["NY","KALB"],["NC","KRDU"],["ND","KBIS"],["OH","KCMH"],
["OK","KTUL"],["OR","KPDX"],["PA","KMDT"],["RI","KPVD"],["SC","KCAE"],["SD","KPIR"],["TN","KBNA"],
["TX","KAUS"],["UT","KSLC"],["VT","KBTV"],["VA","KRIC"],["WA","KSEA"],["WV","KCRW"],["WI","KMSN"],
["WY","KCPR"]]
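Because the state codes are later matched against two-letter postal abbreviations when the map is drawn, a mistyped code would silently drop a state from the figure. The following is a minimal sketch of a sanity check for a list in the same `[state, ICAO]` shape. The helper name `find_invalid_codes` and the sample list (with a deliberately wrong code) are hypothetical and not part of the original notebook.

```python
# Two-letter USPS abbreviations for the 50 US states.
US_STATE_CODES = {
    "AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DE", "FL", "GA",
    "HI", "ID", "IL", "IN", "IA", "KS", "KY", "LA", "ME", "MD",
    "MA", "MI", "MN", "MS", "MO", "MT", "NE", "NV", "NH", "NJ",
    "NM", "NY", "NC", "ND", "OH", "OK", "OR", "PA", "RI", "SC",
    "SD", "TN", "TX", "UT", "VT", "VA", "WA", "WV", "WI", "WY",
}


def find_invalid_codes(airports):
    """Return the [state, ICAO] pairs whose state code is not a valid USPS code."""
    return [pair for pair in airports if pair[0] not in US_STATE_CODES]


# Hypothetical sample: "XX" is intentionally invalid to show what the check reports.
sample_ports = [["PA", "KMDT"], ["MN", "KMSP"], ["XX", "KPVD"]]
print(find_invalid_codes(sample_ports))  # → [['XX', 'KPVD']]
```

Running the same check on the full `ports` list before making any API requests would be a cheap way to catch typos early.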
In this section we demonstrate the use of api.weather.gov to fetch current weather observations for a location.
getAirportDataCurrent() requires a list of lists as input. The input provides the state and airport information necessary to retrieve current data.
Example:
ports = [["PA","KMDT"],["MN","KMSP"]]
getAirportDataCurrent(airports=ports)
This returns current temperature data for Harrisburg International Airport and Minneapolis–Saint Paul International Airport, which we use to represent each respective state.
#NOAA API
#authorization for API use
my_headers = {'User-Agent' : 'jn875@drexel.edu'}
def getAirportDataCurrent(airports=[]):
    map_data = []
    for i in airports:
        observation_request = f"https://api.weather.gov/stations/{i[1]}/observations/latest"
        #station_request = f"https://api.weather.gov/stations/{i[1]}"
        observation_data = requests.get(observation_request,headers=my_headers).json()
        #station_data = requests.get(station_request,headers=my_headers).json()
        #if there is no reading, leave it as an NA value (None)
        temp = observation_data["properties"]["temperature"]["value"]
        #if there is a current reading, transform it to Fahrenheit
        if temp is not None:
            temp = temp*1.8+32
        #GeoJSON coordinates are ordered [longitude, latitude]
        map_data.append([i[0],temp,observation_data["geometry"]["coordinates"][0],
                         observation_data["geometry"]["coordinates"][1],
                         observation_data['properties']['timestamp']])
    return map_data
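The parsing step can be exercised without a network call. The sketch below is an illustration under stated assumptions: the helper names `celsius_to_fahrenheit` and `parse_observation` are hypothetical, and the sample dictionary is a hand-trimmed payload with the same shape as an api.weather.gov observation, using illustrative values.

```python
def celsius_to_fahrenheit(celsius):
    """Convert Celsius to Fahrenheit, passing None through for missing readings."""
    if celsius is None:
        return None
    return celsius * 1.8 + 32


def parse_observation(state, observation):
    """Flatten one observation (a GeoJSON Feature) into a labelled row."""
    # GeoJSON stores coordinates as [longitude, latitude].
    lon, lat = observation["geometry"]["coordinates"]
    props = observation["properties"]
    return {
        "state": state,
        "temp_f": celsius_to_fahrenheit(props["temperature"]["value"]),
        "lat": lat,
        "lon": lon,
        "timestamp": props["timestamp"],
    }


# Hand-trimmed payload (illustrative values, not a live response).
sample = {
    "geometry": {"type": "Point", "coordinates": [-76.77, 40.2]},
    "properties": {
        "timestamp": "2024-06-08T19:56:00+00:00",
        "temperature": {"unitCode": "wmoUnit:degC", "value": 25.6},
    },
}

row = parse_observation("PA", sample)
print(row["state"], round(row["temp_f"], 2), row["lat"], row["lon"])
```

Returning a dictionary with explicit keys, rather than a positional list, makes it harder to mix up latitude and longitude downstream.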
The NWS current-weather API response provides rich data for each location. Each observation is generated by an FAA-certified Automated Weather Observation System (AWOS) station and comes from sensors at the location. Since this work is funded by public tax dollars, the agency is obligated to make the data available. The result is well-organized, detailed weather data that weather professionals use daily.
"geometry" key provides location in latitude and longitude
"properties" key provides many relevant current values:
Here we demonstrate the retrieval of the current temperature in Pennsylvania.
#Using this API users can generate useful queries for specific locations
observation_request = f"https://api.weather.gov/stations/KMDT/observations/latest"
observation_data = requests.get(observation_request,headers=my_headers).json()
print(json.dumps(observation_data,indent=2))
print("Latest temperature at KMDT", observation_data['properties']['temperature']['value'],"C")
{
"@context": [
"https://geojson.org/geojson-ld/geojson-context.jsonld",
{
"@version": "1.1",
"wx": "https://api.weather.gov/ontology#",
"s": "https://schema.org/",
"geo": "http://www.opengis.net/ont/geosparql#",
"unit": "http://codes.wmo.int/common/unit/",
"@vocab": "https://api.weather.gov/ontology#",
"geometry": {
"@id": "s:GeoCoordinates",
"@type": "geo:wktLiteral"
},
"city": "s:addressLocality",
"state": "s:addressRegion",
"distance": {
"@id": "s:Distance",
"@type": "s:QuantitativeValue"
},
"bearing": {
"@type": "s:QuantitativeValue"
},
"value": {
"@id": "s:value"
},
"unitCode": {
"@id": "s:unitCode",
"@type": "@id"
},
"forecastOffice": {
"@type": "@id"
},
"forecastGridData": {
"@type": "@id"
},
"publicZone": {
"@type": "@id"
},
"county": {
"@type": "@id"
}
}
],
"id": "https://api.weather.gov/stations/KMDT/observations/2024-06-08T19:56:00+00:00",
"type": "Feature",
"geometry": {
"type": "Point",
"coordinates": [
-76.77,
40.2
]
},
"properties": {
"@id": "https://api.weather.gov/stations/KMDT/observations/2024-06-08T19:56:00+00:00",
"@type": "wx:ObservationStation",
"elevation": {
"unitCode": "wmoUnit:m",
"value": 94
},
"station": "https://api.weather.gov/stations/KMDT",
"timestamp": "2024-06-08T19:56:00+00:00",
"rawMessage": "KMDT 081956Z 31013G20KT 10SM SCT060 26/11 A2981 RMK AO2 SLP093 T02560106",
"textDescription": "Partly Cloudy",
"icon": "https://api.weather.gov/icons/land/day/sct?size=medium",
"presentWeather": [],
"temperature": {
"unitCode": "wmoUnit:degC",
"value": 25.6,
"qualityControl": "V"
},
"dewpoint": {
"unitCode": "wmoUnit:degC",
"value": 10.6,
"qualityControl": "V"
},
"windDirection": {
"unitCode": "wmoUnit:degree_(angle)",
"value": null,
"qualityControl": "Z"
},
"windSpeed": {
"unitCode": "wmoUnit:km_h-1",
"value": null,
"qualityControl": "Z"
},
"windGust": {
"unitCode": "wmoUnit:km_h-1",
"value": null,
"qualityControl": "Z"
},
"barometricPressure": {
"unitCode": "wmoUnit:Pa",
"value": 100950,
"qualityControl": "V"
},
"seaLevelPressure": {
"unitCode": "wmoUnit:Pa",
"value": 100930,
"qualityControl": "V"
},
"visibility": {
"unitCode": "wmoUnit:m",
"value": 16090,
"qualityControl": "C"
},
"maxTemperatureLast24Hours": {
"unitCode": "wmoUnit:degC",
"value": null
},
"minTemperatureLast24Hours": {
"unitCode": "wmoUnit:degC",
"value": null
},
"precipitationLastHour": {
"unitCode": "wmoUnit:mm",
"value": null,
"qualityControl": "Z"
},
"precipitationLast3Hours": {
"unitCode": "wmoUnit:mm",
"value": null,
"qualityControl": "Z"
},
"precipitationLast6Hours": {
"unitCode": "wmoUnit:mm",
"value": null,
"qualityControl": "Z"
},
"relativeHumidity": {
"unitCode": "wmoUnit:percent",
"value": 38.945937790629,
"qualityControl": "V"
},
"windChill": {
"unitCode": "wmoUnit:degC",
"value": null,
"qualityControl": "V"
},
"heatIndex": {
"unitCode": "wmoUnit:degC",
"value": 25.232477264533333,
"qualityControl": "V"
},
"cloudLayers": [
{
"base": {
"unitCode": "wmoUnit:m",
"value": 1830
},
"amount": "SCT"
}
]
}
}
Latest temperature at KMDT 25.6 C
As an additional dataset, we look at temperature histories for each state from 1895-01-01 to 2023-07-31. Using the ACIS API, we query average temperatures at monthly intervals. This is the historical temperature data we use for predicting future weather patterns. In this section we demonstrate the use of the ACIS API to query historical weather observations for a location.
Two functions makeRequest(url,params) and getAirportDataHistorical(airports) are defined for requesting data from the ACIS API.
makeRequest() uses the base url and specific API parameters to make a request for a location. Once the request is made, the function parses the JSON response and returns it.
getAirportDataHistorical() uses makeRequest() to build a dataset for multiple locations and returns a nested list. This function contains a dictionary of parameters for the API. Those parameters request the monthly average temperature from 1895 to 2023.
#Function request to acis database
def makeRequest(url,params) :
    req = urllib.request.Request(url,
                                 json.dumps(params).encode('utf-8'),
                                 {"Content-Type":"application/json"})
    #send the request once, inside the try block so HTTP errors are caught
    try:
        response = urllib.request.urlopen(req)
        return json.loads(response.read())
    except urllib.error.HTTPError as error:
        #on a 400 (Bad Request) print the message; the function then returns None
        if error.code == 400 : print(error.msg)
base_url = "http://data.rcc-acis.org/"
#parameters for the monthly average temperature of location from 1895-01-01 to 2023-07-31
def getAirportDataHistorical(airports=[]):
    outputs=list()
    for i in airports:
        #named params rather than input to avoid shadowing the built-in input()
        params={"sid":i[1],"edate":"2023-07-31","elems":[{"name":"avgt","interval":"mly","reduce":"mean"}],
                "sdate":"1895-01-01",'output':'json'}
        temp=makeRequest(base_url+"StnData",params)
        outputs.append([temp])
    return outputs
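To see exactly what is sent to the StnData endpoint, the request body can be built and inspected without contacting the server. This is a minimal sketch: the helper name `build_acis_params` and its default arguments are assumptions introduced for illustration, while the parameter keys and values mirror the dictionary used above.

```python
import json


def build_acis_params(station_id, start="1895-01-01", end="2023-07-31"):
    """Build the parameter dictionary for a monthly mean average-temperature query."""
    return {
        "sid": station_id,   # station identifier, e.g. an ICAO code
        "sdate": start,      # first date of the requested period
        "edate": end,        # last date of the requested period
        "elems": [{"name": "avgt", "interval": "mly", "reduce": "mean"}],
        "output": "json",
    }


params = build_acis_params("KMSP")
# This is the byte string that would be passed as the POST body.
body = json.dumps(params).encode("utf-8")
print(body.decode("utf-8"))
```

Separating parameter construction from the network call makes it easy to change the date range or element for one station without touching the request logic.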
In this example we see a snippet of retrieved data for MSP airport. Data is missing until April 1938, when this location started collecting observations. "M" values were transformed to NaN and removed when conducting subsequent analysis.
getAirportDataHistorical([["MN","KMSP"]])[0][0]['data'][515:525]
[['1937-12', 'M'], ['1938-01', 'M'], ['1938-02', 'M'], ['1938-03', 'M'], ['1938-04', '46.00'], ['1938-05', '56.71'], ['1938-06', '67.90'], ['1938-07', '73.44'], ['1938-08', '73.82'], ['1938-09', '62.22']]
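The handling of "M" markers can be sketched on a few of the rows shown above, using only the standard library. The helper name `parse_acis_rows` is hypothetical; the notebook itself performs the equivalent replacement later with pandas.

```python
import math


def parse_acis_rows(rows):
    """Convert [['YYYY-MM', value], ...] into (month, float) pairs, mapping 'M' to NaN."""
    parsed = []
    for month, value in rows:
        parsed.append((month, math.nan if value == "M" else float(value)))
    return parsed


rows = [["1938-02", "M"], ["1938-03", "M"], ["1938-04", "46.00"], ["1938-05", "56.71"]]
parsed = parse_acis_rows(rows)

# Keep only the months that actually have an observation.
available = [(month, value) for month, value in parsed if not math.isnan(value)]
print(available)  # → [('1938-04', 46.0), ('1938-05', 56.71)]
```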
For the Philadelphia data, we have three datasets downloaded from NOAA as CSV files, covering maximum temperature, minimum temperature, and average temperature. Here, we focus on the "maximum Philadelphia Temperature" dataset. This historical temperature data is used for comparison with API data. The CSV file comprises 13 columns: the year, followed by January through December. The temperature is measured in Fahrenheit (°F). The year column spans 1895 to 2023 and is used as the index, so the loaded data frame has 129 rows and 12 columns.
#from google.colab import drive
#drive.mount('/content/drive')
#max_data_Phila = pd.read_csv("/content/drive/MyDrive/Jupiter/project-2/maxTempPhila.csv",index_col=0)
max_data_Phila = pd.read_csv("./maxTempPhila.csv",index_col=0)
#historical_maximum_data_Phila
print(max_data_Phila.shape)
print(max_data_Phila.head())
(129, 12)
jan feb mar apr may jun jul aug sep oct nov dec
year
1895 55 53 64 84 94 97 94 98 97.0 74.0 74.0 66.0
1896 53 60 63 93 93 91 93 97 92.0 74.0 73.0 60.0
1897 61 53 68 85 82 91 94 90 96.0 88.0 71.0 63.0
1898 58 61 72 78 91 94 100 93 96.0 83.0 66.0 60.0
1899 56 57 70 80 89 97 96 95 87.0 80.0 65.0 65.0
This minimum temperature dataset has the same structure as the maximum temperature dataset. It spans from 1895 to 2023, with the first column representing the years. The other columns indicate the minimum temperature for each month of the corresponding year.
#min_data_Phila = pd.read_csv("/content/drive/MyDrive/Jupiter/project-2/minTempPhila.csv",index_col=0)
min_data_Phila = pd.read_csv("./minTempPhila.csv",index_col=0)
#historical_minimum_data_Phila
print(min_data_Phila.head())
jan feb mar apr may jun jul aug sep oct nov dec
year
1895 10 -3 18 32 40 54 57 56 45.0 34.0 26.0 13.0
1896 4 -2 15 28 45 54 61 56 44.0 37.0 29.0 12.0
1897 7 18 23 27 46 49 63 62 45.0 39.0 24.0 16.0
1898 12 8 25 25 40 53 57 60 51.0 35.0 26.0 16.0
1899 6 -6 23 28 45 58 60 58 45.0 34.0 29.0 8.0
This average temperature dataset has the same structure as the previous datasets. It covers the period from 1895 to 2023, with the first column representing the years. The remaining columns indicate the average temperature for each month of the corresponding year.
#average_data_Phila = pd.read_csv("/content/drive/MyDrive/Jupiter/project-2/averageTempPhila.csv",index_col=0)
average_data_Phila = pd.read_csv("./averageTempPhila.csv",index_col=0)
#historical_average_data_Phila
print(average_data_Phila.head())
jan feb mar apr may jun jul aug sep oct nov dec
year
1895 30.4 25.4 37.9 51.7 62.2 74.1 73.3 77.5 72.3 52.6 46.5 38.9
1896 31.0 33.6 35.9 55.3 67.2 70.3 77.5 76.6 67.8 53.6 50.4 34.7
1897 30.8 35.3 43.1 52.7 62.7 68.9 76.4 74.4 68.4 58.3 45.9 38.1
1898 35.3 35.6 48.0 49.5 61.2 72.4 78.2 76.8 71.4 58.6 44.6 35.9
1899 32.3 28.1 40.7 53.4 63.3 74.9 76.7 74.8 67.0 58.6 46.3 37.5
Our areas of interest for weather analysis are:
In order to get current pseudotemperature for US states we invoke the getAirportDataCurrent function with our list of states and airports as input.
current_dat = getAirportDataCurrent(airports=ports)
current_dat[0:5]
[['AL', 86.0, -86.75, 33.57, '2024-06-08T19:53:00+00:00'], ['AK', 75.92, -134.58, 58.37, '2024-06-08T19:53:00+00:00'], ['AZ', 100.94, -112.02, 33.43, '2024-06-08T18:51:00+00:00'], ['AR', 91.94, -92.23, 34.72, '2024-06-08T19:53:00+00:00'], ['CA', 80.96000000000001, -121.58, 38.7, '2024-06-08T19:53:00+00:00']]
Then, the Python package pandas is used to format the result as a DataFrame.
map_data_current = pd.DataFrame(current_dat, columns = ["state","temp","lat","lon",'timestamp'])
map_data_current.head()
| | state | temp | lat | lon | timestamp |
|---|---|---|---|---|---|
| 0 | AL | 86.00 | -86.75 | 33.57 | 2024-06-08T19:53:00+00:00 |
| 1 | AK | 75.92 | -134.58 | 58.37 | 2024-06-08T19:53:00+00:00 |
| 2 | AZ | 100.94 | -112.02 | 33.43 | 2024-06-08T18:51:00+00:00 |
| 3 | AR | 91.94 | -92.23 | 34.72 | 2024-06-08T19:53:00+00:00 |
| 4 | CA | 80.96 | -121.58 | 38.70 | 2024-06-08T19:53:00+00:00 |
In order to get historical pseudotemperatures for US states, we use the getAirportDataHistorical function on our list of states and airports.
#this code will take a minute or two
historical = getAirportDataHistorical(ports)
rows = [i[0] for i in historical[0][0]['data']]
historical_data = pd.DataFrame(index=rows)
#add one column of monthly values per airport, named by its ICAO identifier
for count, i in enumerate(historical):
    historical_data[ports[count][1]]= [x[1] for x in i[0]['data']]
historical_data = historical_data.replace("M",np.nan)
historical_data.index = pd.to_datetime(historical_data.index)
#print(historical_data.min().name)
Here we analyze all non-null values in the historical data as a flat list of values. We first look at data coverage: the ratio of non-null values to months gives the mean number of stations reporting per month. About 30.7 of the 50 stations report in an average month, so roughly 39% of the station-months are missing. The gaps come from observation stations that did not yet exist earlier in the timeline.
numeric = historical_data.to_numpy()
#remove the null observation for analysis
vals=[i[~pd.isnull(i)] for i in numeric]
vals=[float(item) for sublist in vals for item in sublist]
print("mean stations reporting per month since 1895:",len(vals)/len(historical_data))
print("highest monthly avgt:",max(vals))
print("lowest monthly avgt:",min(vals))
mean stations reporting per month since 1895: 30.684381075826312
highest monthly avgt: 102.74
lowest monthly avgt: -11.4
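A share of missing values is the number of missing cells divided by the total number of cells (months times stations). The sketch below shows that calculation on a small made-up table; the helper name `missing_share` and the toy values are assumptions for illustration only.

```python
import math


def missing_share(table):
    """Return the fraction of cells in a rectangular table that are NaN."""
    cells = [value for row in table for value in row]
    missing = sum(1 for value in cells if math.isnan(value))
    return missing / len(cells)


# Toy table: 4 months (rows) by 3 stations (columns), with made-up values.
toy = [
    [math.nan, 24.1, math.nan],
    [30.5, 25.0, math.nan],
    [42.3, 41.9, 40.0],
    [56.0, 55.2, 57.1],
]

print(f"{100 * missing_share(toy):.1f}% missing")  # → 25.0% missing
```

With a pandas data frame, the same quantity could be obtained from the element-wise null mask averaged over all cells.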
We have a dataset containing monthly temperatures for each year. Initially, it was challenging to calculate and plot the data points due to the arrangement of months and years. To address this, we used the "transpose" function to swap the rows and columns of the data frame, making it easier to work with. The transformed dataset is shown below:
new_average_data_Phila= average_data_Phila.transpose()
print(new_average_data_Phila.head())
year 1895 1896 1897 1898 1899 1900 1901 1902 1903 1904 ... 2014 \
jan 30.4 31.0 30.8 35.3 32.3 34.8 33.1 31.3 32.7 25.9 ... 28.3
feb 25.4 33.6 35.3 35.6 28.1 32.4 27.1 29.7 36.5 27.4 ... 32.1
mar 37.9 35.9 43.1 48.0 40.7 37.5 41.8 46.3 49.4 39.4 ... 38.9
apr 51.7 55.3 52.7 49.5 53.4 53.7 50.1 52.6 53.3 49.1 ... 53.8
may 62.2 67.2 62.7 61.2 63.3 63.0 60.5 62.5 65.6 65.5 ... 65.4

year 2015 2016 2017 2018 2019 2020 2021 2022 2023
jan 30.9 34.2 38.5 32.8 33.3 38.9 35.7 32.4 43.3
feb 25.8 38.6 44.2 41.9 37.1 40.8 34.2 39.7 42.7
mar 39.1 51.0 42.1 40.1 42.8 48.9 47.4 47.8 45.1
apr 55.4 54.5 59.5 50.4 59.0 51.6 55.6 54.0 58.3
may 70.1 63.0 63.1 67.7 66.4 61.4 64.1 67.5 62.4

[5 rows x 129 columns]
We also added three columns to the dataset holding the maximum, minimum, and mean temperature for each month across all years from 1895 to 2023. The columns are named AverageUpperLimit, AverageLowerLimit, and AverageAverageLimit to reflect the calculation and the average temperature dataset they belong to.
We display these calculations below to verify them for the average temperature dataset. These values illustrate the range of average temperatures for each month. For example, in August, the average temperature ranges from 70.2 to 81.3 degrees Fahrenheit.
#ADD MAX, MIN, AVERAGE COLUMN
new_average_data_Phila = new_average_data_Phila.assign(
AverageUpperLimit=new_average_data_Phila.max(axis=1),
AverageLowerLimit=new_average_data_Phila.min(axis=1),
AverageAverageLimit=new_average_data_Phila.mean(axis=1).round())
print(new_average_data_Phila.iloc[:,-3:])
year AverageUpperLimit AverageLowerLimit AverageAverageLimit
jan 46.2 20.0 33.0
feb 44.2 22.2 34.0
mar 52.5 32.6 43.0
apr 59.5 46.8 53.0
may 70.8 55.9 64.0
jun 78.2 65.8 72.0
jul 82.4 72.0 77.0
aug 81.3 70.2 76.0
sep 74.5 62.8 69.0
oct 64.5 51.8 58.0
nov 54.0 39.9 47.0
dec 51.2 25.5 37.0
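The per-month upper limit, lower limit, and rounded mean can be reproduced by hand on a small slice of the data. This sketch uses only the first five years and three months of the average-temperature table printed earlier, so its numbers differ from the full 129-year results; the function name `monthly_limits` is hypothetical.

```python
from statistics import mean

# First five years of the average-temperature table (jan, feb, mar only).
sample = {
    1895: {"jan": 30.4, "feb": 25.4, "mar": 37.9},
    1896: {"jan": 31.0, "feb": 33.6, "mar": 35.9},
    1897: {"jan": 30.8, "feb": 35.3, "mar": 43.1},
    1898: {"jan": 35.3, "feb": 35.6, "mar": 48.0},
    1899: {"jan": 32.3, "feb": 28.1, "mar": 40.7},
}


def monthly_limits(table):
    """For each month, return the max, min, and rounded mean across all years."""
    months = list(next(iter(table.values())).keys())
    limits = {}
    for month in months:
        values = [row[month] for row in table.values()]
        limits[month] = {
            "upper": max(values),
            "lower": min(values),
            "average": round(mean(values)),
        }
    return limits


limits = monthly_limits(sample)
for month, lim in limits.items():
    print(month, lim["upper"], lim["lower"], lim["average"])
```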
For the minimum temperature dataset, we transposed the data and added three new columns: MinUpperLimit, MinLowerLimit, and MinAverageLimit. These columns represent the maximum, minimum, and average minimum temperatures for each month, respectively. Based on these values, we can determine the range of minimum temperatures for each month. For instance, in August, the minimum temperature ranges from 44 to 65 degrees Fahrenheit.
new_min_data_Phila= min_data_Phila.transpose()
#ADD MAX, MIN, AVERAGE COLUMN
new_min_data_Phila = new_min_data_Phila.assign(
MinUpperLimit=new_min_data_Phila.max(axis=1),
MinLowerLimit=new_min_data_Phila.min(axis=1),
MinAverageLimit=new_min_data_Phila.mean(axis=1).round())
#print(new_min_data_Phila)
print(new_min_data_Phila.iloc[:,-3:])
year MinUpperLimit MinLowerLimit MinAverageLimit
jan 28.0 -7.0 9.0
feb 27.0 -11.0 11.0
mar 30.0 7.0 20.0
apr 40.0 14.0 31.0
may 51.0 28.0 42.0
jun 59.0 44.0 52.0
jul 67.0 51.0 59.0
aug 65.0 44.0 57.0
sep 58.0 35.0 47.0
oct 46.0 25.0 35.0
nov 34.0 14.0 25.0
dec 30.0 -4.0 15.0
For the maximum temperature dataset, we transposed the data and added three new columns: MaxUpperLimit, MaxLowerLimit, and MaxAverageLimit. These columns represent the highest, lowest, and average maximum temperatures for each month, respectively. Based on these values, we can determine the range of maximum temperatures for each month. For instance, in August, the maximum temperature ranges from 85 to 106 degrees Fahrenheit.
new_max_data_Phila= max_data_Phila.transpose()
#ADD MAX, MIN, AVERAGE COLUMN
new_max_data_Phila = new_max_data_Phila.assign(
MaxUpperLimit=new_max_data_Phila.max(axis=1),
MaxLowerLimit=new_max_data_Phila.min(axis=1),
MaxAverageLimit=new_max_data_Phila.mean(axis=1).round())
#print(new_min_data_Phila)
print(new_max_data_Phila.iloc[:,-3:])
year MaxUpperLimit MaxLowerLimit MaxAverageLimit
jan 74.0 44.0 60.0
feb 79.0 41.0 61.0
mar 87.0 54.0 73.0
apr 95.0 73.0 83.0
may 97.0 80.0 89.0
jun 102.0 84.0 94.0
jul 104.0 89.0 96.0
aug 106.0 85.0 94.0
sep 100.0 81.0 90.0
oct 96.0 73.0 82.0
nov 84.0 60.0 72.0
dec 73.0 50.0 62.0
We combined all the calculated data from the three datasets into a new data frame to support comprehensive visualization of Philadelphia's monthly climate changes. This new data frame allows us to see the range of maximum, minimum, and average temperatures for each month. For example, in August, temperatures range from 44 to 106 degrees Fahrenheit, with an average temperature of 76 degrees Fahrenheit.
month_Phila = new_average_data_Phila.iloc[:,-3:]
month_Phila = month_Phila.assign(
MaxUpperLimit=new_max_data_Phila.iloc[:,-3],
MaxLowerLimit=new_max_data_Phila.iloc[:,-2],
MaxAverageLimit=new_max_data_Phila.iloc[:,-1])
month_Phila = month_Phila.assign(
MinUpperLimit=new_min_data_Phila.iloc[:,-3],
MinLowerLimit=new_min_data_Phila.iloc[:,-2],
MinAverageLimit=new_min_data_Phila.iloc[:,-1])
print(month_Phila)
year AverageUpperLimit AverageLowerLimit AverageAverageLimit \
jan 46.2 20.0 33.0
feb 44.2 22.2 34.0
mar 52.5 32.6 43.0
apr 59.5 46.8 53.0
may 70.8 55.9 64.0
jun 78.2 65.8 72.0
jul 82.4 72.0 77.0
aug 81.3 70.2 76.0
sep 74.5 62.8 69.0
oct 64.5 51.8 58.0
nov 54.0 39.9 47.0
dec 51.2 25.5 37.0

year MaxUpperLimit MaxLowerLimit MaxAverageLimit MinUpperLimit \
jan 74.0 44.0 60.0 28.0
feb 79.0 41.0 61.0 27.0
mar 87.0 54.0 73.0 30.0
apr 95.0 73.0 83.0 40.0
may 97.0 80.0 89.0 51.0
jun 102.0 84.0 94.0 59.0
jul 104.0 89.0 96.0 67.0
aug 106.0 85.0 94.0 65.0
sep 100.0 81.0 90.0 58.0
oct 96.0 73.0 82.0 46.0
nov 84.0 60.0 72.0 34.0
dec 73.0 50.0 62.0 30.0

year MinLowerLimit MinAverageLimit
jan -7.0 9.0
feb -11.0 11.0
mar 7.0 20.0
apr 14.0 31.0
may 28.0 42.0
jun 44.0 52.0
jul 51.0 59.0
aug 44.0 57.0
sep 35.0 47.0
oct 25.0 35.0
nov 14.0 25.0
dec -4.0 15.0
We calculated the maximum, minimum, and average temperatures for each year in Philadelphia's maximum temperature dataset. Three new columns, MaxUpperLimit, MaxLowerLimit, and MaxAverageLimit, were added to the end of the data frame. These columns indicate the temperature range for each year and support data visualization and linear regression modeling.
max_data_Phila = max_data_Phila.assign(
MaxUpperLimit=max_data_Phila.max(axis=1),
MaxLowerLimit=max_data_Phila.min(axis=1),
MaxAverageLimit=round(max_data_Phila.mean(axis=1),1))
print(max_data_Phila.head())
jan feb mar apr may jun jul aug sep oct nov dec \
year
1895 55 53 64 84 94 97 94 98 97.0 74.0 74.0 66.0
1896 53 60 63 93 93 91 93 97 92.0 74.0 73.0 60.0
1897 61 53 68 85 82 91 94 90 96.0 88.0 71.0 63.0
1898 58 61 72 78 91 94 100 93 96.0 83.0 66.0 60.0
1899 56 57 70 80 89 97 96 95 87.0 80.0 65.0 65.0
MaxUpperLimit MaxLowerLimit MaxAverageLimit
year
1895 98.0 53.0 79.2
1896 97.0 53.0 78.5
1897 96.0 53.0 78.5
1898 100.0 58.0 79.3
1899 97.0 56.0 78.1
For the minimum temperature dataset, we added three columns (MinUpperLimit, MinLowerLimit, and MinAverageLimit) at the end of the data frame.
min_data_Phila = min_data_Phila.assign(
MinUpperLimit=min_data_Phila.max(axis=1),
MinLowerLimit=min_data_Phila.min(axis=1),
MinAverageLimit=round(min_data_Phila.mean(axis=1),1))
print(min_data_Phila.head())
jan feb mar apr may jun jul aug sep oct nov dec \
year
1895 10 -3 18 32 40 54 57 56 45.0 34.0 26.0 13.0
1896 4 -2 15 28 45 54 61 56 44.0 37.0 29.0 12.0
1897 7 18 23 27 46 49 63 62 45.0 39.0 24.0 16.0
1898 12 8 25 25 40 53 57 60 51.0 35.0 26.0 16.0
1899 6 -6 23 28 45 58 60 58 45.0 34.0 29.0 8.0
MinUpperLimit MinLowerLimit MinAverageLimit
year
1895 57.0 -3.0 31.8
1896 61.0 -2.0 31.9
1897 63.0 7.0 34.9
1898 60.0 8.0 34.0
1899 60.0 -6.0 32.3
For the average temperature dataset, we added three columns (AverageUpperLimit, AverageLowerLimit, and AverageAverageLimit) at the end of the data frame.
average_data_Phila = average_data_Phila.assign(
AverageUpperLimit=average_data_Phila.max(axis=1),
AverageLowerLimit=average_data_Phila.min(axis=1),
AverageAverageLimit=round(average_data_Phila.mean(axis=1),1))
print(average_data_Phila.head())
jan feb mar apr may jun jul aug sep oct nov dec \
year
1895 30.4 25.4 37.9 51.7 62.2 74.1 73.3 77.5 72.3 52.6 46.5 38.9
1896 31.0 33.6 35.9 55.3 67.2 70.3 77.5 76.6 67.8 53.6 50.4 34.7
1897 30.8 35.3 43.1 52.7 62.7 68.9 76.4 74.4 68.4 58.3 45.9 38.1
1898 35.3 35.6 48.0 49.5 61.2 72.4 78.2 76.8 71.4 58.6 44.6 35.9
1899 32.3 28.1 40.7 53.4 63.3 74.9 76.7 74.8 67.0 58.6 46.3 37.5
AverageUpperLimit AverageLowerLimit AverageAverageLimit
year
1895 77.5 25.4 53.6
1896 77.5 31.0 54.5
1897 76.4 30.8 54.6
1898 78.2 35.3 55.6
1899 76.7 28.1 54.5
For data visualization through linear regression, we used the linregress function from the stats module of scipy. This function takes x and y datasets and returns the slope and intercept, allowing us to use the linear equation y=ax+b. This enables projections of temperature trends out to 2150 based on data from 1895 to 2023. However, it is important to note that while linear regression can project future trends, it extrapolates a straight line far beyond the observed data, so its predictions may not be accurate.
#x holds the years (the index of the data frame); y holds each year's highest maximum temperature
x = list(max_data_Phila.index)
y = list(max_data_Phila['MaxUpperLimit'])
slope, intercept, r, p, std_err = stats.linregress(x, y)
#evaluate the fitted line y = slope * x + intercept
def myfunc(x):
    return slope * x + intercept
mymodel = list(map(myfunc, x))
print("Our linear regression model for maximum temperature is: y =" ,
      round(slope,4), "* x + ", round(intercept,2))
print("Our predicted maximum temperature for 2150 is: ",
      myfunc(2150), "degree F")
Our linear regression model for maximum temperature is: y = -0.0008 * x + 98.99
Our predicted maximum temperature for 2150 is: 97.34208966905189 degree F
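For readers who want to see what a least-squares line fit computes, the slope and intercept can be derived by hand: the slope is the covariance of x and y divided by the variance of x, and the intercept makes the line pass through the point of means. The sketch below is a standalone illustration with made-up yearly values, not the notebook's data; `fit_line` is a hypothetical helper, not the scipy implementation.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up values that lie exactly on a line rising 0.5 degrees per year.
years = [2000, 2001, 2002, 2003]
temps = [90.0, 90.5, 91.0, 91.5]

slope, intercept = fit_line(years, temps)
prediction_2010 = slope * 2010 + intercept
print(slope, intercept, prediction_2010)  # → 0.5 -910.0 95.0
```

Because the intercept is the fitted value at year 0, it has no physical meaning on its own; only predictions at years near the observed range are well supported.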
Based on our analytical weather data, we investigate weather trends that will support our weather forecast application:
sns.histplot(map_data_current["temp"],kde=False,color="lightgrey",edgecolor="black")
plt.xlabel("Degrees Fahrenheit")
plt.ylabel("# of Data Points")
plt.title("Current temperatures US States; n=50")
Text(0.5, 1.0, 'Current temperatures US States; n=50')
The plot above displays the distribution of temperatures across 50 US states. The X-axis represents temperatures in degrees Fahrenheit, ranging from 60°F to 100°F, while the Y-axis shows the number of data points, from 0 to 14. The histogram reveals that the most frequent temperature range is 70-75°F, with around 14 data points. There are also smaller peaks in the 60-65°F, 75-80°F, 80-85°F, and 85-90°F ranges, each having approximately 6 to 8 data points. Fewer data points are observed in the 90-95°F range, and the least frequent range is 95-100°F, with about 2 data points. This visualization provides an overview of the current temperature distribution across US states.
fig = go.Figure()
fig.add_trace(go.Choropleth(
locations=map_data_current["state"],
locationmode="USA-states",
name="",
z=map_data_current['temp']))
fig.add_trace(go.Scattergeo(
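#note: the "lat"/"lon" column labels are swapped relative to their contents
#(GeoJSON order is [longitude, latitude]), hence the crossed assignment here
lat=map_data_current["lon"],
lon=map_data_current["lat"],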
mode="markers",
name="",
marker_color="white"))
fig.update_layout(autosize=True,
height=650,
width=900,
title = 'Current temperature US States in Fahrenheit at: '+map_data_current['timestamp'][0],
geo=dict(scope='usa'))
fig
The choropleth map shows the current temperatures across the United States. The map uses a color gradient to represent temperature ranges, with purple indicating lower temperatures (around 65°F) and yellow indicating higher temperatures (up to 100°F). The map shows a clear regional temperature distribution, with the western and northeastern parts of the country experiencing cooler temperatures (purple and blue shades), while the central and southeastern regions are warmer (orange and yellow shades). The white dots mark the airport stations where the temperatures were measured. This visualization provides a snapshot of temperature variations across the US at a given time.
We have attached the heatmap image below as a reference in case the script does not execute properly.
Timeseries of Monthly Average Temperatures for each US State
Here we show the historical patterns for every pseudotemperature available since 1895 as a time series. Some locations have all data available. Most locations are accounted for after the 1960s.
sns.histplot(vals,kde=False,color="lightgrey",line_kws={'color':"black"})
#plt.yscale('log')
plt.title("Histogram of All Available Monthly Average Values Since 1895 n=47,279")
Text(0.5, 1.0, 'Histogram of All Available Monthly Average Values Since 1895 n=47,279')
The histogram depicts the distribution of monthly average temperature values from 1895 onwards. The x-axis represents the temperature in degrees Fahrenheit, ranging from 0 to 100°F, while the y-axis shows the count of temperature occurrences. The histogram displays a bell-shaped distribution, with most temperatures falling between 40°F and 80°F. The peak of the distribution occurs around 70°F, indicating that this temperature range has the highest frequency. The count gradually decreases towards the lower and higher ends of the temperature spectrum, suggesting fewer occurrences of extreme temperatures. Overall, the histogram provides a comprehensive overview of historical temperature data, highlighting the most common monthly average temperatures over the specified period.
#plt.figure(figsize=(34,3))
for i in ports:
    sns.lineplot(pd.to_numeric(historical_data[i[1]][~pd.isnull(historical_data[i[1]])]),
                 linewidth=1,alpha=.5)
plt.title("Monthly Average Temperature Fahrenheit Stratified by US State n=50")
plt.ylabel("Temp Fahrenheit")
The time series plot displays the monthly average temperatures in Fahrenheit for all 50 US states from 1895 to 2023. The x-axis spans the years from 1895 to 2023, and the y-axis ranges from 0°F to 100°F. Each vertical line represents the monthly average temperature for a specific year and state. Early years (1895-1920) show significant temperature variability, while post-1940 data appears more stable yet still exhibits seasonal variations. The plot uses color coding to represent different states, showing an overall upward trend in temperatures over the years, indicating an increase in average monthly temperatures across the United States.
Next we add an average US temperature line with a regression. For this project we compute the US monthly average temperature as the mean of the average temperatures available at each timepoint.
averages=[]
for i in range(len(rows)):
    timepoint = historical_data.iloc[[i]].values[0]
    time=[float(j) for j in timepoint if pd.isnull(j)==False]
    averages.append(sum(time)/len(time))
averages_df = pd.DataFrame(index=rows)
averages_df['averages']=averages
averages_df
averages_df.index = pd.to_datetime(averages_df.index)
averages_df
| | averages |
|---|---|
| 1895-01-01 | 24.160000 |
| 1895-02-01 | 25.052500 |
| 1895-03-01 | 42.130000 |
| 1895-04-01 | 56.067500 |
| 1895-05-01 | 61.892500 |
| ... | ... |
| 2023-03-01 | 45.029388 |
| 2023-04-01 | 55.845510 |
| 2023-05-01 | 65.336531 |
| 2023-06-01 | 72.242653 |
| 2023-07-01 | 78.820204 |
1543 rows × 1 columns
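The same "mean of whatever is available" rule can also be written without an explicit loop. The sketch below is a minimal illustration on a made-up three-month table (the `toy` frame and its column names are assumptions, not the project data); it relies on pandas skipping missing values when averaging across columns.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for historical_data: rows are months, columns are
# locations, and NaN/None marks a location with no record for that month.
toy = pd.DataFrame(
    {
        "PA": [30.0, 32.0, 41.0],
        "NJ": [None, "34.0", "43.0"],   # values stored as strings, as in some CSV loads
        "TX": [50.0, np.nan, 60.0],
    },
    index=pd.to_datetime(["1895-01-01", "1895-02-01", "1895-03-01"]),
)

# Convert every column to numbers; anything unparseable becomes NaN.
toy = toy.apply(pd.to_numeric, errors="coerce")

# mean(axis=1) averages across columns; skipna=True (the default) leaves
# missing values out of both the sum and the count.
toy_averages = toy.mean(axis=1, skipna=True).to_frame("averages")
print(toy_averages)
```

Each row is divided by the number of locations that actually reported that month, which is the same behavior as the loop above.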
Now that we have the average of the monthly average temperatures in the US, we will try to predict future data. First, we split the averages dataset into train and test sets.
plt.figure(figsize=(24,4))
train = averages_df[averages_df.index < pd.to_datetime("2010-11-01", format='%Y-%m-%d')]
train
test = averages_df[averages_df.index >= pd.to_datetime("2010-11-01", format='%Y-%m-%d')]
test
plt.plot(train['averages'], color = "black",label="train")
plt.plot(test['averages'], color = "red",label="test")
plt.legend()
The time series plot shows the US average monthly temperatures from 1895 to 2023, divided into training and test datasets. The x-axis represents the years, while the y-axis shows the temperature in Fahrenheit, ranging from 20°F to 80°F. The black line is the training dataset, covering 1895 through October 2010, and shows regular, cyclical temperature variation with clear seasonal patterns. The red line is the test dataset, covering November 2010 through 2023, and continues the same seasonal pattern seen in the training data.
We want to see if a linear model can predict future data. To use dates in a model we need to transform them into ordinal dates, which are a numeric representation of the date. To achieve this we transform the datetime objects with datetime.toordinal().
averages_df["date"] = averages_df.index
averages_df['ordinal_date']= pd.to_datetime(averages_df['date'])
averages_df['ordinal_date']=averages_df['ordinal_date'].apply(lambda x: x.toordinal())
averages_df
| | averages | date | ordinal_date |
|---|---|---|---|
| 1895-01-01 | 24.160000 | 1895-01-01 | 691770 |
| 1895-02-01 | 25.052500 | 1895-02-01 | 691801 |
| 1895-03-01 | 42.130000 | 1895-03-01 | 691829 |
| 1895-04-01 | 56.067500 | 1895-04-01 | 691860 |
| 1895-05-01 | 61.892500 | 1895-05-01 | 691890 |
| ... | ... | ... | ... |
| 2023-03-01 | 45.029388 | 2023-03-01 | 738580 |
| 2023-04-01 | 55.845510 | 2023-04-01 | 738611 |
| 2023-05-01 | 65.336531 | 2023-05-01 | 738641 |
| 2023-06-01 | 72.242653 | 2023-06-01 | 738672 |
| 2023-07-01 | 78.820204 | 2023-07-01 | 738702 |
1543 rows × 3 columns
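As a quick check of what the ordinal values in the table mean, the sketch below uses only the standard library. An ordinal date counts days, with January 1 of year 1 as day 1, so consecutive months differ by the length of the earlier month.

```python
from datetime import date

first = date(1895, 1, 1)
print(first.toordinal())  # 691770, matching the first row of the table

# Consecutive months differ by the number of days in the earlier month.
print(date(1895, 2, 1).toordinal() - first.toordinal())  # 31

# fromordinal() inverts the mapping.
print(date.fromordinal(738702))  # 2023-07-01
```

Because ordinal dates count days, a slope fitted against them is expressed in degrees per day; multiplying by 365.25 converts it to degrees per year.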
Now we will use the averages and the ordinal dates as input to the model
Predict future values using linear regression
slope_h, intercept_h, r_h, p_h, std_err_h = stats.linregress(averages_df['ordinal_date'],averages_df['averages'])
def myfunc_h(x):
    return slope_h * x + intercept_h
y_pred_df_lm = pd.DataFrame(index=test.index)
y_pred_df_lm["date"] = y_pred_df_lm.index
y_pred_df_lm['date']= pd.to_datetime(y_pred_df_lm['date'])
y_pred_df_lm['ordinal_date']=y_pred_df_lm['date'].apply(lambda x: x.toordinal())
y_pred_df_lm['prediction']=y_pred_df_lm['ordinal_date'].apply(lambda x: myfunc_h(x))
y_pred_df_lm
linear_regression_rmse = np.sqrt(mean_squared_error(test["averages"].values,
y_pred_df_lm["prediction"]))
print("RMSE: ",linear_regression_rmse)
print("slope:",slope_h)
RMSE:  15.27092511726122
slope: 0.0001406182661936425
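To see why a straight line scores this badly on monthly data, the following sketch uses a synthetic series (an assumption, not the project data): a constant 50°F mean with a 22°F seasonal swing. The helper names `rmse` and `fit_line` are ours. The line is fitted on the first 15 years only and scored on the held-out last 5 years, and a seasonal-naive baseline is included for comparison.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx

# Synthetic monthly series: 20 years, purely seasonal, no long-term trend.
months = list(range(240))
temps = [50 + 22 * math.sin(2 * math.pi * m / 12) for m in months]

# Chronological split: first 180 months for fitting, last 60 for scoring.
train_x, test_x = months[:180], months[180:]
train_y, test_y = temps[:180], temps[180:]

slope, intercept = fit_line(train_x, train_y)
line_rmse = rmse(test_y, [slope * x + intercept for x in test_x])

# Seasonal-naive baseline: reuse the value of the same calendar month
# from the last training year (months 168 to 179).
naive_rmse = rmse(test_y, [train_y[168 + (x % 12)] for x in test_x])

print(round(line_rmse, 2), round(naive_rmse, 2))
```

A line cannot follow the yearly cycle, so its error stays close to the seasonal amplitude divided by the square root of 2, while any model that reuses the seasonal pattern removes most of it. This is consistent in size with the RMSE of about 15°F reported above, although the real series also contains noise and a trend.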
Since the linear regression performs poorly (an RMSE of about 15.3°F), we look for another approach. Autoregressive models predict future values from past data. In this example we use past temperature averages to forecast future averages.
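The model fitted in the next cell uses order=(1,1,1) and seasonal_order=(1,1,1,12), which is a seasonal ARIMA with a 12-month period. As a sketch, writing $B$ for the backshift operator ($B y_t = y_{t-1}$) and assuming the sign convention in which moving-average terms enter with a plus sign, the specification reads:

```latex
(1 - \phi_1 B)\,(1 - \Phi_1 B^{12})\,(1 - B)\,(1 - B^{12})\, y_t
  = (1 + \theta_1 B)\,(1 + \Theta_1 B^{12})\, \varepsilon_t
```

Here $(1 - B)$ is the first difference, $(1 - B^{12})$ is the seasonal difference against the same month one year earlier, $\phi_1$ and $\Phi_1$ are the non-seasonal and seasonal autoregressive coefficients, $\theta_1$ and $\Theta_1$ are the corresponding moving-average coefficients, and $\varepsilon_t$ is white noise.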
model = ARIMA(train['averages'], order=(1,1,1),seasonal_order=(1,1,1,12))
model_fit = model.fit()
plt.figure(figsize=(24,4))
y_pred = model_fit.get_forecast(len(test.index))
y_pred_df = y_pred.conf_int(alpha = 0.05)
y_pred_df["Predictions"] = model_fit.predict(start = y_pred_df.index[0], end = y_pred_df.index[-1])
y_pred_df.index = test.index
y_pred_out = y_pred_df["Predictions"]
plt.plot(train['averages'], color = "black",label= 'train set')
plt.plot(test['averages'], color = "red",label= "test set")
plt.plot(y_pred_df_lm['prediction'], color='yellow', label = 'linear regression')
plt.plot(y_pred_out, color='green', label = 'ARIMA Predictions')
plt.legend()
The time series plot illustrates the US average monthly temperatures from 1895 to 2023. The x-axis represents the years, while the y-axis shows the temperature in Fahrenheit, ranging from 20°F to 80°F. The black line denotes the training dataset, covering 1895 through October 2010, and displays regular, cyclical temperature patterns that reflect seasonal changes. The red line represents the test dataset, covering November 2010 through 2023, and continues the pattern seen in the training data. The plot also includes a yellow line for the linear regression model and a green line for the ARIMA predictions, both overlaid on the test data, showing how each model aligns with the observed values in the test set. The ARIMA predictions follow the seasonal cycle, whereas the linear regression does not.
#we show that we can fit a better model and decrease the RMSE with the Autoregression model compared to linear regression
arma_rmse = np.sqrt(mean_squared_error(test["averages"].values, y_pred_df["Predictions"]))
print("RMSE: ",arma_rmse)
RMSE: 2.2812186810309716
Model predictions:
"Predicted monthly average temperature in US 1970, 2150, and 2250"
s1 = pd.to_datetime("1970-1-01", format='%Y-%m-%d')
e1 = pd.to_datetime("1970-12-01", format='%Y-%m-%d')
print(model_fit.predict(start=s1, end=e1))
s = pd.to_datetime("2150-1-01", format='%Y-%m-%d')
e = pd.to_datetime("2150-12-01", format='%Y-%m-%d')
print(model_fit.predict(start=s, end=e))
s3 = pd.to_datetime("2250-1-01", format='%Y-%m-%d')
e3 = pd.to_datetime("2250-12-01", format='%Y-%m-%d')
print(model_fit.predict(start=s3, end=e3))
1970-01-01    31.681011
1970-02-01    33.872073
1970-03-01    42.572465
1970-04-01    53.095679
1970-05-01    62.350186
1970-06-01    70.948309
1970-07-01    75.477208
1970-08-01    74.027748
1970-09-01    67.096961
1970-10-01    56.878593
1970-11-01    44.644538
1970-12-01    35.037239
Freq: MS, Name: predicted_mean, dtype: float64
2150-01-01    39.977354
2150-02-01    43.014572
2150-03-01    51.465119
2150-04-01    60.819545
2150-05-01    69.809393
2150-06-01    78.281231
2150-07-01    82.799590
2150-08-01    81.683886
2150-09-01    74.451104
2150-10-01    63.029776
2150-11-01    52.453464
2150-12-01    42.634329
Freq: MS, Name: predicted_mean, dtype: float64
2250-01-01    43.940857
2250-02-01    46.978075
2250-03-01    55.428623
2250-04-01    64.783048
2250-05-01    73.772896
2250-06-01    82.244734
2250-07-01    86.763093
2250-08-01    85.647389
2250-09-01    78.414607
2250-10-01    66.993279
2250-11-01    56.416967
2250-12-01    46.597832
Freq: MS, Name: predicted_mean, dtype: float64
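Comparing the 2150 and 2250 forecasts month by month helps show what the model is extrapolating. The sketch below uses only the values printed above (the list names are ours) and computes the change over the 100 years between the two forecasts.

```python
# Forecast values copied from the printed output above (degrees Fahrenheit).
pred_2150 = [39.977354, 43.014572, 51.465119, 60.819545, 69.809393, 78.281231,
             82.799590, 81.683886, 74.451104, 63.029776, 52.453464, 42.634329]
pred_2250 = [43.940857, 46.978075, 55.428623, 64.783048, 73.772896, 82.244734,
             86.763093, 85.647389, 78.414607, 66.993279, 56.416967, 46.597832]

# Month-by-month change over the 100 years between the two forecasts.
diffs = [round(b - a, 6) for a, b in zip(pred_2150, pred_2250)]
print(min(diffs), max(diffs))

# Implied warming rate in degrees F per year.
rate_per_year = sum(diffs) / len(diffs) / 100
print(round(rate_per_year, 4))
```

Every month rises by the same amount, about 3.96°F per century. This suggests that at long horizons the forecast is one fixed seasonal profile shifted upward at a constant rate, so these values are better read as a trend extrapolation than as month-specific predictions.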
According to Drexel News, Philadelphia's temperature is expected to exceed 90 degrees Fahrenheit by 2025. This projection prompted us to investigate the matter as it relates to our project. We began by examining the average temperature dataset to understand the temperature range for each month. In the graph below, the red line represents the highest monthly average temperature recorded for each month, which we refer to as the upper limit of average temperature. The green line indicates the lowest monthly average temperature for each month, referred to as the lower limit of average temperature. For instance, the average temperature for August ranges from 70.2 to 81.3 degrees Fahrenheit. The plot below illustrates these limits for each month, showing how monthly average temperatures varied from 1895 to 2023.
#print(new_average_data_Phila['AverageUpperLimit'])
print("For example, the maximum average temperature of August is ")
print(new_average_data_Phila['AverageUpperLimit']['aug'], "degree F")
For example, the maximum average temperature of August is 81.3 degree F
print("For example, the minimum average temperature of August is ")
print(new_average_data_Phila['AverageLowerLimit']['aug'],"degree F")
For example, the minimum average temperature of August is 70.2 degree F
new_average_data_Phila.plot(legend=None, marker='.', linestyle='none',
title='PHILADELPHIA MONTHLY AVERAGE TEMPERATURE \n FROM 1895 TO 2023')
new_average_data_Phila['AverageUpperLimit'].plot(color="red")
new_average_data_Phila['AverageLowerLimit'].plot(color="green")
plt.rcParams["figure.figsize"] = [8.6, 6.8]
plt.rcParams["figure.autolayout"] = True
plt.xlabel("MONTH")
plt.ylabel("TEMPERATURE \n DEGREE F")
plt.grid()
plt.show()
The line graph shows the Philadelphia monthly average temperature from 1895 to 2023. The x-axis represents the months from January to December, while the y-axis represents the temperature in degrees Fahrenheit, ranging from 20°F to 80°F. The graph displays two prominent lines: a red line indicating the upper limit (maximum average temperature) and a green line indicating the lower limit (minimum average temperature) for each month. The temperature data points for each year are shown as colored dots between these lines, illustrating the range and distribution of average temperatures. For instance, in August, the average temperature ranges from about 70.2°F to 81.3°F. This visualization highlights the historical monthly average temperature variations in Philadelphia over the years.
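The cell that builds new_average_data_Phila is not shown in this section. As a hedged sketch of one way such limit columns could be derived, the example below uses a made-up table with one row per month and one column per year (the `toy` values are assumptions) and takes the maximum, minimum, and mean across years.

```python
import pandas as pd

# Hypothetical monthly averages (degrees F): one row per month, one column per year.
toy = pd.DataFrame(
    {1895: [30.1, 74.0], 1960: [33.4, 76.5], 2023: [35.0, 81.3]},
    index=["jan", "aug"],
)

# The keyword arguments are evaluated on the original table before any
# column is added, so the new columns do not feed into one another.
limits = toy.assign(
    AverageUpperLimit=toy.max(axis=1),
    AverageLowerLimit=toy.min(axis=1),
    AverageAverageLimit=toy.mean(axis=1).round(1),
)
print(limits.loc["aug", ["AverageLowerLimit", "AverageUpperLimit", "AverageAverageLimit"]])
```

Computing all three summaries in a single `assign` call avoids a subtle problem with adding the columns one at a time: a later `min` or `mean` would otherwise include the limit columns created earlier.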
print("For example, the minimum upper limit temperature of August is ")
print(new_min_data_Phila['MinUpperLimit']['aug'], "degree F")
For example, the minimum upper limit temperature of August is 65.0 degree F
print("For example, the minimum lower limit temperature of August is ")
print(new_min_data_Phila['MinLowerLimit']['aug'], "degree F")
For example, the minimum lower limit temperature of August is 44.0 degree F
new_min_data_Phila.plot(legend=None, marker='.', linestyle='none',
title='PHILADELPHIA MONTHLY MINIMUM TEMPERATURE \n FROM 1895 TO 2023')
new_min_data_Phila['MinUpperLimit'].plot(color="red")
new_min_data_Phila['MinLowerLimit'].plot(color="green")
plt.rcParams["figure.figsize"] = [8.6, 6.8]
plt.rcParams["figure.autolayout"] = True
plt.xlabel("MONTH")
plt.ylabel("TEMPERATURE \n DEGREE F")
plt.grid()
plt.show()
The line chart shows the historical monthly minimum temperatures in Philadelphia from 1895 to 2023. The x-axis represents the months of the year, from January to December, while the y-axis shows the temperature in degrees Fahrenheit, ranging from -10°F to 70°F. The red line represents the upper limit of the minimum temperatures recorded each month, and the green line represents the lower limit. The colored dots scattered along each month indicate individual data points for the minimum temperatures over the years. This chart helps visualize the range and variability of the monthly minimum temperatures, with noticeable increases during the summer months (June to August) and decreases during the winter months (December to February). The spread of the dots indicates the variability in minimum temperatures for each month over the long historical period.
The plot below shows the range of maximum temperatures in each month from 1895 to 2023, bounded by the maximum upper limit and the maximum lower limit. For example, the maximum temperature in August ranges from 85 to 106°F.
print("For example, the Maximum Upper Limit temperature of August is ")
print(new_max_data_Phila['MaxUpperLimit']['aug'],"degree F")
For example, the Maximum Upper Limit temperature of August is 106.0 degree F
print("For example, the Maximum Lower Limit temperature of August is ")
print(new_max_data_Phila['MaxLowerLimit']['aug'],"degree F")
For example, the Maximum Lower Limit temperature of August is 85.0 degree F
new_max_data_Phila.plot(legend=None, marker='.', linestyle='none',
title='PHILADELPHIA MONTHLY MAXIMUM TEMPERATURE \n FROM 1895 TO 2023')
new_max_data_Phila['MaxUpperLimit'].plot(color="red")
new_max_data_Phila['MaxLowerLimit'].plot(color="green")
plt.rcParams["figure.figsize"] = [8.6, 6.8]
plt.rcParams["figure.autolayout"] = True
plt.xlabel("MONTH")
plt.ylabel("TEMPERATURE \n DEGREE F")
plt.grid()
plt.show()
The line chart illustrates the historical monthly maximum temperatures in Philadelphia from 1895 to 2023. The x-axis represents the months of the year, from January to December, and the y-axis indicates the temperature in degrees Fahrenheit, ranging from 40°F to 110°F. The red line shows the upper limit of the maximum temperatures recorded each month, while the green line displays the lower limit. Colored dots represent individual data points for the maximum temperatures over the years. This chart visually demonstrates the range and variability of the monthly maximum temperatures, with the highest temperatures occurring during the summer months (June to August) and the lowest during the winter months (December to February). The dispersion of the dots indicates the variation in maximum temperatures for each month throughout the long historical period, highlighting the broader range of maximum temperatures compared to minimum temperatures.
In the plot below, we present all calculated data from the minimum, maximum, and average temperature datasets as previously discussed. The red line represents the maximum temperature for each month, the green line represents the minimum temperature, and the yellow line represents the average temperature. This combined data visualization demonstrates the wide range of temperatures for each month. Based on these calculations, we can predict that the overall temperature in Philadelphia in August will be 76°F, although it can fluctuate widely from 44°F to 106°F.
print("The Maximum temperature of August is ")
print(month_Phila['MaxUpperLimit']['aug'],"degree F")
print("The Average temperature of August is ")
print(month_Phila['AverageAverageLimit']['aug'],"degree F")
print("The Minimum temperature of August is ")
print(month_Phila['MinLowerLimit']['aug'],"degree F")
The Maximum temperature of August is 106.0 degree F
The Average temperature of August is 76.0 degree F
The Minimum temperature of August is 44.0 degree F
month_Phila.plot(legend=None, marker='.', linestyle='none',
title='PHILADELPHIA MONTHLY TEMPERATURE \n FROM 1895 TO 2023',label="")
month_Phila['MaxUpperLimit'].plot(color="red",label ='Maximum Philadelphia Temperature')
month_Phila['MinLowerLimit'].plot(color="green",label ='Minimum Philadelphia Temperature')
month_Phila['AverageAverageLimit'].plot(color="yellow", label ='Average Philadelphia Temperature')
plt.rcParams["figure.figsize"] = [8.6, 6.8]
plt.rcParams["figure.autolayout"] = True
plt.xlabel("MONTH")
plt.ylabel("TEMPERATURE \n DEGREE F")
plt.legend(loc='center left', bbox_to_anchor=(1, 0.5))
plt.grid()
plt.show()
The plot displays the calculated data from the minimum, maximum, and average temperature datasets for Philadelphia from 1895 to 2023. The red line represents the maximum temperatures for each month, the green line shows the minimum temperatures, and the yellow line indicates the average temperatures. This combined visualization demonstrates the wide range of temperatures for each month. The scatter points show the upper and lower limits and the average limits for each temperature dataset. For example, the overall average temperature in August is predicted to be 76°F, but the temperature can range from as low as 44°F to as high as 106°F. This comprehensive data presentation helps in understanding the historical temperature variations and trends over the years.
We also examined the yearly maximum temperature of Philadelphia as a time series from 1895 to 2023, shown in the plot below. The yellow line represents the maximum average limit, showing an increase from below 80°F to above 80°F over the years. The red line indicates the maximum upper limit, consistently hovering around 100°F throughout the time series. The green line depicts the maximum lower limit, which has risen from 45°F to 70°F. The scatter points provide a detailed distribution of temperatures each year, demonstrating the variability and trends in Philadelphia's yearly maximum temperatures.
#max_data_Phila = max_data_Phila.assign(MaxUpperLimit=max_data_Phila.max(axis=1),MaxLowerLimit=max_data_Phila.min(axis=1),MaxAverageLimit=round(max_data_Phila.mean(axis=1),1))
max_data_Phila.plot(legend=None, marker='.', linestyle='none',
title='PHILADELPHIA YEARLY MAXIMUM TEMPERATURE \n FROM 1895 TO 2023')
max_data_Phila['MaxUpperLimit'].plot(color="red")
max_data_Phila['MaxLowerLimit'].plot(color="green")
max_data_Phila['MaxAverageLimit'].plot(color="yellow")
plt.xlabel("YEAR")
plt.ylabel("TEMPERATURE \n DEGREE F")
plt.grid()
plt.show()
We then examined the yearly minimum temperature of Philadelphia as a time series from 1895 to 2023, shown in the plot below. The red line represents the upper limit of the minimum temperatures, consistently around 60°F. The yellow line indicates the average minimum temperature, showing a slight upward trend from about 30°F to above 35°F. The green line shows the lower limit of the minimum temperatures, which has increased from below 0°F to approximately 10°F. The scattered points illustrate the yearly distribution of minimum temperatures, highlighting the fluctuations and trends over time. This visualization provides an overview of how the minimum temperatures in Philadelphia have changed over the past century.
#min_data_Phila = min_data_Phila.assign(MinUpperLimit=min_data_Phila.max(axis=1),MinLowerLimit=min_data_Phila.min(axis=1),MinAverageLimit=round(min_data_Phila.mean(axis=1),1))
min_data_Phila.plot(legend=None, marker='.', linestyle='none',
title='PHILADELPHIA YEARLY MINIMUM TEMPERATURE \n FROM 1895 TO 2023')
min_data_Phila['MinUpperLimit'].plot(color="red")
min_data_Phila['MinLowerLimit'].plot(color="green")
min_data_Phila['MinAverageLimit'].plot(color="yellow")
plt.xlabel("YEAR")
plt.ylabel("TEMPERATURE \n DEGREE F")
plt.grid()
plt.show()
Finally, we examined the yearly average temperature of Philadelphia as a time series from 1895 to 2023, shown in the plot below. The red line denotes the upper limit of the average temperatures, consistently hovering around 80°F. The yellow line represents the average yearly temperature, which shows a slight increasing trend from around 55°F to 60°F over the period. The green line indicates the lower limit of the average temperatures, which has risen from below 30°F to approximately 40°F. The scatter points display the yearly temperature distribution, capturing the variability and trends over the past century. This visualization highlights the gradual increase in average temperatures in Philadelphia, reflecting broader climatic changes over time.
#average_data_Phila = average_data_Phila.assign(AverageUpperLimit=average_data_Phila.max(axis=1),AverageLowerLimit=average_data_Phila.min(axis=1),AverageAverageLimit=round(average_data_Phila.mean(axis=1),1))
average_data_Phila.plot(legend=None, marker='.', linestyle='none',
title='PHILADELPHIA YEARLY AVERAGE TEMPERATURE \n FROM 1895 TO 2023')
average_data_Phila['AverageUpperLimit'].plot(color="red")
average_data_Phila['AverageLowerLimit'].plot(color="green")
average_data_Phila['AverageAverageLimit'].plot(color="yellow")
plt.xlabel("YEAR")
plt.ylabel("TEMPERATURE \n DEGREE F")
plt.grid()
plt.show()
By combining the three datasets, we observed overlapping temperature data points in Philadelphia from 1895 to 2023. Maximum Average Temperature and Average Upper Limit Temperature overlap between 70 to 85°F, while Average Temperature, Minimum Upper Limit Temperature, and Maximum Lower Limit Temperature overlap between 50 to 65°F. Minimum Average Temperature and Average Lower Limit Temperature overlap between 20 to 45°F. The Maximum Temperature (95 to 105°F) and Minimum Temperature (-10 to 20°F) do not overlap with other ranges. The plot below illustrates yearly temperature trends, showing consistent maximum temperatures around 100°F, maximum average temperatures around 80°F, average temperatures around 55°F, and minimum temperatures fluctuating between 0°F and 20°F. This visualization highlights the broad range of temperature fluctuations and trends, capturing both extremes and average conditions in Philadelphia's climate.
x = list(average_data_Phila.T.head(0))
y = list(average_data_Phila['AverageAverageLimit'])
yMin = list(min_data_Phila['MinLowerLimit'])
yMax = list(max_data_Phila['MaxUpperLimit'])
plt.scatter(x, yMax, label ='Maximum Temperature', color="red")
plt.scatter(x, list(max_data_Phila['MaxAverageLimit']), label ='Maximum Average Temperature')
plt.scatter(x, list(max_data_Phila['MaxLowerLimit']), label ='Maximum Lower Limit Temperature')
plt.scatter(x, list(average_data_Phila['AverageUpperLimit']), label ='Average Upper Limit Temperature')
plt.scatter(x, y, label ='Average Temperature', color = "yellow")
plt.scatter(x, list(average_data_Phila['AverageLowerLimit']), label ='Average Lower Limit Temperature')
plt.scatter(x, list(min_data_Phila['MinUpperLimit']), label ='Minimum Upper Limit Temperature')
plt.scatter(x, list(min_data_Phila['MinAverageLimit']), label ='Minimum Average Temperature')
plt.scatter(x, yMin, label ='Minimum Temperature', color="green")
plt.xlabel("YEAR")
plt.ylabel("TEMPERATURE \n DEGREE F")
plt.legend(loc='center left', bbox_to_anchor=(1, 0.5))
plt.title('PHILADELPHIA YEARLY TEMPERATURE \n FROM 1895 TO 2023')
plt.grid()
plt.show()
x = list(average_data_Phila.T.head(0))
y = list(average_data_Phila['AverageAverageLimit'])
yMin = list(min_data_Phila['MinLowerLimit'])
yMax = list(max_data_Phila['MaxUpperLimit'])
slope, intercept, r, p, std_err = stats.linregress(x, y)
slopeMax, interceptMax, rMax, pMAx, std_err_Max = stats.linregress(x, yMax)
slopeMin, interceptMin, rMin, pMin, std_err_Min = stats.linregress(x, yMin)
def myfunc(x):
    return slope * x + intercept
def myfuncMax(x):
    return slopeMax * x + interceptMax
def myfuncMin(x):
    return slopeMin * x + interceptMin
x1 = list(range(1895,2150))
mymodel = list(map(myfunc, x1))
mymodelMax = list(map(myfuncMax, x1))
mymodelMin = list(map(myfuncMin, x1))
plt.scatter(x, yMax, label ='Maximum Temperature', color="red")
plt.scatter(x, yMin, label ='Minimum Temperature', color="green")
plt.scatter(x, y, label ='Average Temperature', color = "yellow")
plt.plot(x1, mymodel,linestyle='dashed',color="yellow",label ='Average Model')
plt.plot(x1, mymodelMax, linestyle='dashed',color="red",label ='Maximum Model')
plt.plot(x1, mymodelMin, linestyle='dashed',color="green",label ='Minimum Model')
plt.xlabel("YEAR")
plt.ylabel("TEMPERATURE \n DEGREE F")
plt.legend(loc='center left', bbox_to_anchor=(1, 0.5))
plt.title('PHILADELPHIA YEARLY TEMPERATURE FROM 1895 TO 2023 \n AND LINEAR REGRESSION MODEL FROM 1895 TO 2150')
plt.grid()
plt.show()
print("Our predicted average temperature for 2150 is: ",round(myfunc(2150),2), "degree F")
print("Our predicted maximum temperature for 2150 is: ",round(myfuncMax(2150),2), "degree F")
print("Our predicted minimum temperature for 2150 is: ",round(myfuncMin(2150),2), "degree F")
Our predicted average temperature for 2150 is:  58.71 degree F
Our predicted maximum temperature for 2150 is:  97.34 degree F
Our predicted minimum temperature for 2150 is:  11.35 degree F
The linear regression model in the time-series plot illustrates Philadelphia's yearly temperatures from 1895 to 2023, along with projections extending to 2150. The red dots represent maximum temperatures, consistently around 100°F, with a red dashed line indicating minimal change in future maximum temperatures. The yellow dots show average temperatures, around 55-60°F, with a yellow dashed line suggesting a slight upward trend in future average temperatures. The green dots depict minimum temperatures, fluctuating around 0-20°F, with a green dashed line predicting a gradual increase in future minimum temperatures. This comprehensive visualization highlights historical temperature trends and provides projections for future temperatures based on linear regression models. It supports the Drexel article's claim that "warming temperatures may cause an increase in hot days," indicating that climate change is occurring in Philadelphia.
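The three regressions above repeat the same fit-and-extrapolate pattern. As a compact alternative, the sketch below wraps that pattern in one helper using NumPy instead of scipy (a substitution on our part). The helper name `fit_trend` and the yearly series are hypothetical, with the series chosen as exact straight lines so the extrapolation is easy to check.

```python
import numpy as np

def fit_trend(years, values):
    """Return a callable straight-line model fitted by least squares."""
    slope, intercept = np.polyfit(years, values, deg=1)
    return np.poly1d([slope, intercept])

# Hypothetical yearly series (not the project data).
years = np.arange(1895, 2024)
series = {
    "average": 17.0 + 0.02 * years,   # exact 0.02 F/year warming trend
    "maximum": 100.0 + 0.0 * years,   # flat series with no trend
}

# One model per series, all built by the same helper.
models = {name: fit_trend(years, vals) for name, vals in series.items()}
for name, model in models.items():
    print(name, round(float(model(2150)), 2), "degree F")
```

Extrapolating to 2150 pushes the line more than a century beyond the last observation, so the result depends entirely on the assumption that the historical slope continues unchanged.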
Initially, we struggled with API access permissions and subscriptions: the free account was limited to 1000 requests at a time. However, we found that NOAA and ACIS provide free APIs, so we could obtain the historical data needed for time series analysis.
Another limitation of this project is the interval of data collection. Intervals that are too small produce too much data, while intervals that are too large may miss important details. The previous API provided at most 40 years of historical data. NOAA offers a solution here as well: its API covers a wide range of time intervals, from 1895 to the present. However, the API responses are very large and slow to load. NOAA also provides CSV datasets for download from its website, covering 1895 to 2023. The dataset includes accurate data points without any empty or missing values.
For data selection, we compared the ACIS API and the NOAA dataset. Both had pros and cons. The disadvantage of the ACIS API is its long loading time due to large file sizes. The disadvantage of the NOAA API is that its information is too condensed to analyze.
We think much of this data, including temperatures, pressures, and humidities, could be used for continued analysis. Such an analysis would be interesting if values differ significantly across the selected locations. For example, a significant difference in temperature opens the door to questions such as why temperature differs across locations, building on an observation to tell the story of why and how.
We are also considering how to combine the API and the NOAA dataset into a good data source for analysis and for developing an accurate weather application for US states and cities. It is an interesting project for building our data interpretation and data analysis skills.